Abstract

Consider $X_1, X_2, \ldots, X_n$ that are independent and identically $N(\mu, \sigma^2)$ distributed. Suppose that we have uncertain prior information that $\mu = 0$. We answer the question: to what extent can a frequentist $1-\alpha$ confidence interval for $\mu$ utilize this prior information?
1. Introduction

Suppose that $X_1, \ldots, X_n$ are independent and identically $N(\mu, \sigma^2)$ distributed. The parameter of interest is $\mu$. Also suppose that previous experience with similar data sets and/or scientific background and expert opinion suggest that $\mu = a$, where $a$ is a specified number. Without loss of generality we assume that $a = 0$. Our aim is to answer the following question. To what extent can a frequentist $1-\alpha$ confidence interval (i.e. a confidence interval whose coverage probability has infimum $1-\alpha$) utilize this prior information?

For the sake of simplicity, we first deal with the case that $\sigma^2$ is known. We find a confidence interval for $\mu$ by first finding a confidence interval for $\theta = \sqrt{n}\,\mu/\sigma$. Let $\bar{X} = n^{-1}\sum_{i=1}^n X_i$ and $X = \bar{X}/(\sigma/\sqrt{n})$, so that $X \sim N(\theta, 1)$. Suppose that $I = [\ell(X), u(X)]$ is a $1-\alpha$ confidence interval for $\theta$, i.e. $P_\theta(\theta \in I) \ge 1-\alpha$ for all $\theta$. The confidence interval for $\mu$ that corresponds to this confidence interval for $\theta$ is $J = [(\sigma/\sqrt{n})\,\ell(X), (\sigma/\sqrt{n})\,u(X)]$. Pratt (1961, 1963) considers $X \sim N(\theta, 1)$ and presents confidence intervals for $\theta$ that (a) have a pre-specified minimum coverage $1-\alpha$ and (b) are shorter than the usual confidence interval when $\theta = 0$. The $1-\alpha$ confidence interval for $\mu$ that has the smallest possible expected length when $\mu = 0$ is derived by Pratt (1961) and is

$$\left[\min\left\{0,\ \bar{X} - z_\alpha \frac{\sigma}{\sqrt{n}}\right\},\ \max\left\{0,\ \bar{X} + z_\alpha \frac{\sigma}{\sqrt{n}}\right\}\right], \qquad (1)$$

where $z_\alpha$ is defined by $P(Z > z_\alpha) = \alpha$ for $Z \sim N(0, 1)$.
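As a concrete illustration of (1) next to the standard interval, here is a minimal Python sketch (standard library only; the function names are hypothetical). When the data lie far from 0, Pratt's interval keeps 0 as an endpoint, so its length grows with $|\bar{X}|$.

```python
import math
from statistics import NormalDist, fmean


def standard_interval(xs, sigma, alpha=0.05):
    """Standard interval [xbar - z_{alpha/2} sigma/sqrt(n), xbar + z_{alpha/2} sigma/sqrt(n)]."""
    xbar = fmean(xs)
    half = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(len(xs))
    return xbar - half, xbar + half


def pratt_interval(xs, sigma, alpha=0.05):
    """Pratt's interval (1), with z_alpha defined by P(Z > z_alpha) = alpha."""
    xbar = fmean(xs)
    half = NormalDist().inv_cdf(1 - alpha) * sigma / math.sqrt(len(xs))
    return min(0.0, xbar - half), max(0.0, xbar + half)
```

For example, with four observations all equal to 5 and $\sigma = 1$, `pratt_interval` returns $[0, 5 + z_{0.05}/2]$, whereas `standard_interval` returns an interval of length $z_{0.025}$ centred at 5.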

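This confidence interval for $\mu$ has the following analogue for the case that $\sigma^2$ is unknown:

$$\left[\min\left\{0,\ \bar{X} - t_{\alpha,n-1} \frac{S}{\sqrt{n}}\right\},\ \max\left\{0,\ \bar{X} + t_{\alpha,n-1} \frac{S}{\sqrt{n}}\right\}\right], \qquad (2)$$

where $t_{\alpha,m}$ is defined by $P(T > t_{\alpha,m}) = \alpha$ for $T \sim t_m$, and $S^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$. This analogue is given by Brown et al. (1995) and has been described, e.g., by Bofinger (1985).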

These confidence intervals have two major problems. The first problem is that their expected lengths diverge to $\infty$ as $|\mu| \to \infty$. This unpleasant feature means that if the prior information happens to be badly incorrect (i.e. $\mu$ happens to be very far from 0) then these confidence intervals perform extremely poorly. The second problem is that neither of these confidence intervals approaches the corresponding standard confidence interval when the data strongly indicate that the prior information about $\mu$ is incorrect. Surely, if the data strongly indicate that this prior information is incorrect, then we should be using something very close to the standard $1-\alpha$ confidence interval for $\mu$. For example, when $\sigma^2$ is known and $|X| > 10$, we should be using the standard confidence interval $[\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}]$ for $\mu$.

In this paper we describe confidence intervals for $\mu$ that do not suffer from these problems. Like Hodges and Lehmann (1952) and Bickel (1983, 1984), our aim is to utilize the uncertain prior information in the frequentist inference of interest, whilst providing a safeguard in case this prior information happens to be incorrect. Our $1-\alpha$ confidence intervals have the following desirable properties. They have expected lengths that (a) are relatively small when the prior information that $\mu = 0$ is correct and (b) have a maximum value that is not too large. They also coincide with the corresponding standard $1-\alpha$ confidence interval when the data happen to strongly contradict the prior information about $\mu$. In Sections 2, 3 and 4 we deal with the case that $\sigma^2$ is known, by applying the methodology of Pratt (1961) with a novel weight function. In Sections 5 and 6 we use the same novel weight function, combined with invariance and a new computationally based approach, to deal with the case that $\sigma^2$ is unknown.

Finally, we note related work on point and interval estimators that utilize uncertain prior information in linear regression. Bickel (1984) presents a comprehensive analysis of point estimators in a very general setting. He also analyzes the coverage properties of some fixed-width confidence intervals, assuming that the covariance matrix of the error vector is known. Tuck (2006) develops a new variable-width confidence interval analogous to (1). The methodology described in Sections 5 and 6 of the present paper, which leads to variable-width confidence intervals, can be extended to the linear regression context (Kabaila and Giri (2007)).

2. The known variance case 

Assume that $\sigma^2$ is known. In the introduction we defined the random variable $X$, which has an $N(\theta, 1)$ distribution, and the $1-\alpha$ confidence intervals $I$ and $J$ for $\theta$ and $\mu$ respectively. Observe that $P_\mu(\mu \in J) = P_\theta(\theta \in I)$, so that the confidence intervals $I$ and $J$ have the same minimum coverage probability. Furthermore, $E_\mu(\text{length of } J) = (\sigma/\sqrt{n})\,E_\theta(\text{length of } I)$ when $\theta = \sqrt{n}\,\mu/\sigma$. In other words, the expected length of $J$ is proportional to the expected length of $I$. We therefore focus on the case that $X$ has an $N(\theta, 1)$ distribution and we have uncertain prior information that $\theta = 0$.

Let $C(X)$ be a $1-\alpha$ confidence region for $\theta$. Define $A(\theta)$ by $\theta \in C(x)$ if and only if $x \in A(\theta)$. Here $A(\theta)$ is the acceptance region for the null hypothesis that $\theta$ is the true parameter value. Let $L(C(X))$ denote the sum of the lengths of the intervals making up $C(X)$. Also let our aim be to minimise the average expected length

$$\int E_\theta\big(L(C(X))\big)\, d\nu(\theta) \qquad (3)$$

for a specified weight function $\nu$. We use $\phi$ to denote the $N(0, 1)$ density function. As proved by Pratt (1961), the solution to this problem is to choose the acceptance region $A(\theta)$ to consist of those values of $x$ such that

$$\frac{\int \phi(x - \theta')\, d\nu(\theta')}{\phi(x - \theta)} < c_\alpha(\theta),$$

where $c_\alpha(\theta)$ is chosen such that $P_\theta(X \in A(\theta)) = 1 - \alpha$. For some weight functions $\nu$ the average expected length is infinite, even for the confidence interval corresponding to the acceptance regions obtained in this way. However, the criterion

$$\int \Big(E_\theta\big(L(C(X))\big) - 2 z_{\alpha/2}\Big)\, d\nu(\theta)$$

takes the (finite) value 0 when $C(X)$ is the standard $1-\alpha$ confidence interval for $\theta$, since that interval always has length $2z_{\alpha/2}$. It is straightforward to show that the minimization of this criterion leads to the same formula for $A(\theta)$ as the (formal) minimization of (3).
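The reason Pratt's rule takes this form can be sketched as follows, assuming that the order of integration may be interchanged. The sketch uses the identity $E_\theta(L(C(X))) = \int P_\theta(t \in C(X))\, dt$ and writes $m(x) = \int \phi(x - \theta')\, d\nu(\theta')$ for the $\nu$-weighted density:

$$\int E_{\theta'}\big(L(C(X))\big)\, d\nu(\theta') = \int \int \int_{A(t)} \phi(x - \theta')\, dx\, d\nu(\theta')\, dt = \int \left( \int_{A(t)} m(x)\, dx \right) dt.$$

For each fixed $t$, the inner integral is minimized subject to $\int_{A(t)} \phi(x - t)\, dx = 1 - \alpha$ by the Neyman-Pearson lemma, which gives $A(t) = \{x : m(x)/\phi(x - t) < c_\alpha(t)\}$. For example, $\nu(\theta') = \theta'$ gives $m(x) = 1$, while a unit point mass at 0 gives $m(x) = \phi(x)$.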

As pointed out by Pratt (1961), the standard $1-\alpha$ confidence interval for $\theta$,

$$[X - z_{\alpha/2},\ X + z_{\alpha/2}], \qquad (4)$$

is the $1-\alpha$ confidence interval that minimizes the average expected length when $\nu(x) = x$ for all $x$. Also, as pointed out by Pratt (1961), the $1-\alpha$ confidence interval for $\theta$ corresponding to (1) is the $1-\alpha$ confidence interval that minimizes the average expected length when $\nu = H$, where $H$ is the unit step function defined by $H(x) = 0$ for $x < 0$ and $H(x) = 1$ for $x \ge 0$.

Now consider a weight function that is a mixture of the weight functions $\nu(x) = x$ and $\nu = H$. It is plausible that if we minimise the average expected length using this weight function, then we will obtain a confidence interval that (a) has relatively small expected length when $\theta = 0$ and (b) overcomes the weaknesses of Pratt's interval (1). So, we consider the $1-\alpha$ confidence interval that minimises the average expected length when the weight function $\nu$ is

$$\nu(x) = wx + H(x) \quad \text{for all } x. \qquad (5)$$

Here, $w$ is a fixed nonnegative number. We call this the 'mixed interval'.

In this case, the acceptance region $A(\theta)$ corresponding to the confidence region $C(X)$ minimizing the average expected length (3) consists of the values of $x$ such that

$$\frac{w + \phi(x)}{\phi(x - \theta)} < c_\alpha(\theta),$$

where $c_\alpha(\theta)$ is chosen such that $P_\theta(X \in A(\theta)) = 1 - \alpha$. Define

$$g(x, c, \theta) = \frac{w + \phi(x)}{\phi(x - \theta)} - c.$$

Also define $B(c, \theta) = \{x : g(x, c, \theta) < 0\}$. Obviously, $c_\alpha(\theta)$ is the value of $c$ such that $P_\theta(X \in B(c, \theta)) = 1 - \alpha$. To analyse the properties of the acceptance region $A(\theta)$ we will need the following theorem.

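Theorem 2.1. For every fixed $w > 0$, $\theta \in \mathbb{R}$ and $c > 0$, the following is true. The set $B(c, \theta)$ is either (a) the empty set or (b) an interval with finite endpoints.

Proof. Fix $w > 0$, $\theta \in \mathbb{R}$ and $c > 0$. Observe that $g(x, c, \theta) \to \infty$ as $|x| \to \infty$. The result will be proved by showing that $\partial g(x, c, \theta)/\partial x$ is an increasing function of $x \in \mathbb{R}$. Now $g(x, c, \theta) = \exp(\theta^2/2)\, g^*(x, \theta) - c$, where $g^*(x, \theta) = w^* \exp(x^2/2 - \theta x) + \exp(-\theta x)$ and $w^* = \sqrt{2\pi}\, w$. Note that

$$\frac{\partial g^*(x, \theta)}{\partial x} = \exp(-\theta^2/2)\, w^* (x - \theta) \exp\big((x - \theta)^2/2\big) - \theta \exp(-\theta x).$$

This is an increasing function of $x$, because $(x - \theta)\exp((x - \theta)^2/2)$ has derivative $(1 + (x - \theta)^2)\exp((x - \theta)^2/2) > 0$ and $-\theta \exp(-\theta x)$ has derivative $\theta^2 \exp(-\theta x) \ge 0$. Hence $\partial g(x, c, \theta)/\partial x$ is an increasing function of $x$, i.e. $g$ is convex in $x$. Since $g(x, c, \theta) \to \infty$ as $|x| \to \infty$, the set $B(c, \theta)$ on which $g$ is negative is therefore either empty or an interval with finite endpoints.

□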

This theorem leads to the very important property of $A(\theta)$ described in the following corollary, whose proof is omitted for the sake of brevity.

Corollary 2.1. For every fixed $w > 0$ and $\theta \in \mathbb{R}$, the following is true. The $1-\alpha$ acceptance region $A(\theta)$ is an interval with finite endpoints.

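The computation of the acceptance region $A(\theta)$ for given $w > 0$ and $\theta \in \mathbb{R}$ is facilitated by the following result.

Theorem 2.2. For every $w > 0$ and $\theta \in \mathbb{R}$,

$$w\sqrt{2\pi}\, \exp(z_{\alpha/2}^2/2) \le c_\alpha(\theta) \le \big(w\sqrt{2\pi} + 1\big) \exp(z_{\alpha/2}^2/2).$$

Proof. The result follows from the fact that for every $w > 0$ and $\theta \in \mathbb{R}$ the following is true. For every $x \in \mathbb{R}$,

$$\frac{w}{\phi(x - \theta)} \le \frac{w + \phi(x)}{\phi(x - \theta)} \le \frac{w + 1/\sqrt{2\pi}}{\phi(x - \theta)}.$$

For $c$ below the lower bound, $B(c, \theta)$ is contained in $\{x : w/\phi(x - \theta) < c\}$, a symmetric interval about $\theta$ with probability less than $1-\alpha$; for $c$ above the upper bound, $B(c, \theta)$ contains $\{x : (w + 1/\sqrt{2\pi})/\phi(x - \theta) < c\}$, which has probability greater than $1-\alpha$. The two bounds are exactly the values of $c$ at which these symmetric intervals equal $[\theta - z_{\alpha/2}, \theta + z_{\alpha/2}]$.

□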

The following theorem describes an important property of the confidence set $C(x)$. The proof of this theorem is omitted for the sake of brevity.

Theorem 2.3. For every $w > 0$ the following is true. The $1-\alpha$ confidence set $C(x)$ is an interval for all sufficiently large $|x|$, with endpoints approaching those of the standard $1-\alpha$ confidence interval $[x - z_{\alpha/2}, x + z_{\alpha/2}]$ as $|x| \to \infty$.

3. Numerical comparison of the intervals for the known variance case 

We continue with our consideration of the case that $\sigma^2$ is known. As described in the introduction, we reduce this case to the problem of finding a $1-\alpha$ confidence interval for $\theta$ based on $X \sim N(\theta, 1)$. We denote the standard $1-\alpha$ confidence interval (4) by $C_S(X)$. Also, we denote by $C_P(X)$ the interval for $\theta$ corresponding to Pratt's interval (1).

Consider the case that the weight function $\nu$ is given by (5). This weight function is a mixture of the weight functions $\nu(x) = x$ and $\nu = H$ that lead to $C_S$ and $C_P$ respectively. For $1-\alpha = 0.95$ and each $w \in \{0.01, 0.1, 1\}$, Corollary 2.1 and Theorem 2.2 were used to compute the acceptance regions $A(\theta)$, corresponding to the 0.95 confidence sets minimizing the average expected length (3), for a fine grid of values of $\theta$. For each of these values of $w$, the confidence region corresponding to $A(\theta)$ was found to always be an interval. We denote by $C_M(X)$ the 0.95 confidence interval minimizing the average expected length when $\nu$ is given by (5) (which we have called the mixed interval). All the computations for the present paper were performed with programs written in MATLAB, using the Optimization and Statistics toolboxes.
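The computation of $A(\theta)$ described above can be sketched in pure Python as follows; this is an illustration, not the paper's MATLAB code. Because $g$ is strictly convex in $x$ (proof of Theorem 2.1), its minimizer is located by ternary search and the two endpoints of $B(c, \theta)$ by bisection; $c_\alpha(\theta)$ is then found by bisection on $\log c$ between the bounds of Theorem 2.2. Working with $\log[(w + \phi(x))/\phi(x - \theta)]$ avoids overflow. The search window `half_width` and the iteration counts are hypothetical choices.

```python
import math
from statistics import NormalDist

STD = NormalDist()                           # N(0, 1)
LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def log_ratio(x, theta, w):
    """log[(w + phi(x)) / phi(x - theta)], evaluated on the log scale to avoid overflow."""
    return math.log(w + STD.pdf(x)) + 0.5 * (x - theta) ** 2 + LOG_SQRT_2PI


def _bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def region_B(log_c, theta, w, half_width=40.0):
    """Endpoints of B(c, theta) = {x : g(x, c, theta) < 0}, or None if B is empty.

    f below has the same sign as g; since g is strictly convex in x, f is unimodal,
    so a ternary search finds its minimiser and bisection finds the two crossings.
    """
    f = lambda x: log_ratio(x, theta, w) - log_c
    lo, hi = theta - half_width, theta + half_width
    a, b = lo, hi
    for _ in range(200):                     # ternary search for the minimiser of f
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if f(m1) < f(m2):
            b = m2
        else:
            a = m1
    xmin = 0.5 * (a + b)
    if f(xmin) >= 0:
        return None
    return _bisect(f, lo, xmin), _bisect(f, xmin, hi)


def acceptance_region(theta, w, alpha=0.05):
    """Return (c_alpha(theta), (left, right)) for the mixed weight function (5), w > 0."""
    z = STD.inv_cdf(1.0 - alpha / 2.0)
    log_lo = math.log(w * math.sqrt(2.0 * math.pi)) + 0.5 * z * z        # Theorem 2.2, lower
    log_hi = math.log(w * math.sqrt(2.0 * math.pi) + 1.0) + 0.5 * z * z  # Theorem 2.2, upper

    def coverage(log_c):
        ends = region_B(log_c, theta, w)
        if ends is None:
            return 0.0
        return STD.cdf(ends[1] - theta) - STD.cdf(ends[0] - theta)

    for _ in range(60):                      # coverage of B(c, theta) is nondecreasing in c
        mid = 0.5 * (log_lo + log_hi)
        if coverage(mid) < 1.0 - alpha:
            log_lo = mid
        else:
            log_hi = mid
    log_c = 0.5 * (log_lo + log_hi)
    return math.exp(log_c), region_B(log_c, theta, w)
```

As a sanity check, at $\theta = 0$ the ratio is symmetric in $x$, so $A(0) = [-z_{\alpha/2}, z_{\alpha/2}]$ and $c_\alpha(0) = 1 + w\sqrt{2\pi}\exp(z_{\alpha/2}^2/2)$. Inverting $A(\theta)$ over a fine grid of $\theta$ values then gives the mixed interval.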

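We use $C_S$ as the standard against which other $1-\alpha$ confidence intervals for $\theta$ are judged. The efficiency of $C_S$ relative to $C$, a $1-\alpha$ confidence interval for $\theta$, for a given value of $\theta$ is defined to be

$$e(\theta, C_S, C) = \left(\frac{E_\theta\big(L(C(X))\big)}{E_\theta\big(L(C_S(X))\big)}\right)^2 = \left(\frac{E_\theta\big(L(C(X))\big)}{2 z_{\alpha/2}}\right)^2.$$

Let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. Note that a $1-\alpha$ confidence interval $C(X)$ for $\theta$ corresponds to a $1-\alpha$ confidence interval $D(\mathbf{X})$ for $\mu$ that is obtained by multiplying the endpoints of $C(X)$ by $\sigma/\sqrt{n}$. Let $D_S(\mathbf{X})$ denote the standard $1-\alpha$ confidence interval for $\mu$. We define the efficiency of $D_S$ relative to $D$ as $\big(E(L(D(\mathbf{X})))/E(L(D_S(\mathbf{X})))\big)^2$ and note that this is the same function of $\theta$ as $e(\theta, C_S, C)$.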

Figure 1 shows plots of the efficiency of $C_S$ relative to $C_M$ for $w = 1$, $w = 0.1$, $w = 0.01$ and $w = 0$, when $1-\alpha = 0.95$. Note that Pratt's interval $C_P$ is equal to the mixed interval $C_M$ for $w = 0$. Also, the standard interval $C_S$ may be viewed as the mixed interval $C_M$ in the limiting case $w \to \infty$. Even for $w = 1$, which is not a particularly large value of $w$, $C_M$ is fairly close to $C_S$.

The minimum over all $1-\alpha$ confidence intervals $C$ of $e(0, C_S, C)$ is 0.7223, and this minimum is achieved by Pratt's interval $C_P$. However, as noted in the introduction, this interval suffers from the severe problems that (a) $e(\theta, C_S, C_P) \to \infty$ as $|\theta| \to \infty$ and (b) it does not revert to the standard interval as $|X| \to \infty$. The mixed interval $C_M$ with $w = 0.1$ is far preferable. For this interval, $e(0, C_S, C_M) = 0.8016$, which is not much larger than 0.7223. Also, for this interval, $e(\theta, C_S, C_M)$ never exceeds 1.2095. Finally, in accordance with Theorem 2.3, this interval approaches the standard interval $C_S$ as $|X| \to \infty$. Of course, the value of $w$ can be chosen so as to reflect the strength of the prior information that $\theta = 0$.
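The value 0.7223 and problem (a) can both be checked in closed form. Pratt's interval for $\theta$ has length $(X + z_\alpha)^+ + (z_\alpha - X)^+$, and for $Y \sim N(m, 1)$ one has $E(Y^+) = m\Phi(m) + \phi(m)$. The short Python sketch below (standard library only; function names are illustrative) evaluates $e(\theta, C_S, C_P)$ from these facts.

```python
from statistics import NormalDist

STD = NormalDist()


def positive_part_mean(m):
    """E(Y^+) for Y ~ N(m, 1), namely m * Phi(m) + phi(m)."""
    return m * STD.cdf(m) + STD.pdf(m)


def efficiency_pratt(theta, alpha=0.05):
    """e(theta, C_S, C_P) for C_P = [min(0, X - z_alpha), max(0, X + z_alpha)], X ~ N(theta, 1)."""
    z_a = STD.inv_cdf(1.0 - alpha)            # one-sided critical value z_alpha
    z_half = STD.inv_cdf(1.0 - alpha / 2.0)   # C_S always has length 2 z_{alpha/2}
    # The length of C_P is (X + z_a)^+ + (z_a - X)^+, so its expectation is a sum of two terms.
    expected_len = positive_part_mean(theta + z_a) + positive_part_mean(z_a - theta)
    return (expected_len / (2.0 * z_half)) ** 2
```

At $\theta = 0$ this returns approximately 0.7223. Because the expected length grows linearly in $|\theta|$, the efficiency grows without bound as $|\theta| \to \infty$.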
Figure 1: Plots of the efficiency $e(\theta, C_S, C_M)$ of the standard interval $C_S$ relative to the mixed interval $C_M$ for $w = 1$, $w = 0.1$, $w = 0.01$ and $w = 0$ when $1-\alpha = 0.95$.

4. Invariance properties of the confidence interval in the known variance case 

In this section we first describe the invariance properties that we expect the confidence interval $J$ (defined in the introduction) to possess. Traditional invariance arguments (see e.g. Casella and Berger (2002, Section 6.4)) do not take the available prior information into account. The novelty in the present section is that the invariance arguments need to take proper account of the uncertain prior information that $\mu = 0$.

We first describe an invariance property that $J$ already possesses. The model that $X_1, \ldots, X_n$ are independent and identically $N(\mu, \sigma^2)$ distributed may be re-expressed as follows. Define $Y_i = aX_i$ for $i = 1, \ldots, n$, where $a > 0$. Thus $Y_1, \ldots, Y_n$ are independent and identically $N(\tilde{\mu}, \tilde{\sigma}^2)$ distributed, where $\tilde{\mu} = a\mu$ and $\tilde{\sigma} = a\sigma$. Define $\tilde{Y} = \bar{Y}/(\tilde{\sigma}/\sqrt{n})$. The prior information $\mu = 0$ may be re-expressed as $\tilde{\mu} = 0$. The re-expressed model and prior information have the same form as the original model and prior information. Thus the confidence interval $\tilde{J} = [(\tilde{\sigma}/\sqrt{n})\,\ell(\tilde{Y}), (\tilde{\sigma}/\sqrt{n})\,u(\tilde{Y})]$ for $\tilde{\mu}$ must lead to a confidence interval for $\mu$ that is identical to $J$. This requirement is satisfied, since $\tilde{Y} = X$ and $\tilde{\sigma}/\sqrt{n} = a\sigma/\sqrt{n}$, so that dividing the endpoints of $\tilde{J}$ by $a$ gives $J$.

Next, we describe an invariance property that $J$ should possess, and conclude from this that the equality $\ell(x) = -u(-x)$ should hold for all $x \in \mathbb{R}$. The model that $X_1, \ldots, X_n$ are independent and identically $N(\mu, \sigma^2)$ distributed may be re-expressed as follows. Define $Y_i = -X_i$ for $i = 1, \ldots, n$. Thus $Y_1, \ldots, Y_n$ are independent and identically $N(\tilde{\mu}, \sigma^2)$ distributed, where $\tilde{\mu} = -\mu$. Define $\tilde{Y} = \bar{Y}/(\sigma/\sqrt{n})$. The prior information $\mu = 0$ may be re-expressed as $\tilde{\mu} = 0$. The re-expressed model and prior information have the same form as the original model and prior information. Thus the confidence interval $\tilde{J} = [(\sigma/\sqrt{n})\,\ell(\tilde{Y}), (\sigma/\sqrt{n})\,u(\tilde{Y})]$ for $\tilde{\mu}$ must lead to a confidence interval for $\mu$ that is identical to $J$. Since $\tilde{Y} = -X$ and $\mu = -\tilde{\mu}$, the interval for $\mu$ implied by $\tilde{J}$ is $[-(\sigma/\sqrt{n})\,u(-X), -(\sigma/\sqrt{n})\,\ell(-X)]$, so this requirement is satisfied if and only if $\ell(x) = -u(-x)$ for all $x \in \mathbb{R}$. Note that both Pratt's interval (1) and the mixed interval defined in Section 2 satisfy this requirement.
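The reflection requirement $\ell(x) = -u(-x)$ is easy to check numerically on a grid. The hypothetical helper below verifies it for the interval for $\theta$ corresponding to Pratt's interval (1) with $\alpha = 0.05$, and shows that an asymmetric interval fails it.

```python
from statistics import NormalDist

Z_A = NormalDist().inv_cdf(0.95)  # z_alpha for alpha = 0.05


def pratt_theta(x):
    """Interval for theta corresponding to Pratt's interval (1); returns (l(x), u(x))."""
    return min(0.0, x - Z_A), max(0.0, x + Z_A)


def satisfies_reflection(interval, xs, tol=1e-12):
    """Check the invariance requirement l(x) = -u(-x) at every x in xs."""
    return all(abs(interval(x)[0] + interval(-x)[1]) <= tol for x in xs)
```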



5. Invariance arguments in the unknown variance case 

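We now consider the case that $\sigma^2$ is unknown. This is the case that is usually encountered in practice. The standard $1-\alpha$ confidence interval for $\mu$ is

$$\left[\bar{X} - t_{\alpha/2,n-1} \frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,n-1} \frac{S}{\sqrt{n}}\right]. \qquad (6)$$

A natural analogue of the confidence interval $J$ for $\mu$ is the confidence interval

$$K = \left[\frac{S}{\sqrt{n}}\, a\!\left(\frac{\bar{X}}{S/\sqrt{n}}\right),\ \frac{S}{\sqrt{n}}\, b\!\left(\frac{\bar{X}}{S/\sqrt{n}}\right)\right]$$

for $\mu$. Note that both (2) and (6) have this form. Suppose that our uncertain prior information is that $\mu = 0$. Using the same model transformations as in Section 4, invariance arguments show that the equality $a(x) = -b(-x)$ must hold for all $x \in \mathbb{R}$. In other words,

$$K = \left[-\frac{S}{\sqrt{n}}\, b\!\left(-\frac{\bar{X}}{S/\sqrt{n}}\right),\ \frac{S}{\sqrt{n}}\, b\!\left(\frac{\bar{X}}{S/\sqrt{n}}\right)\right]. \qquad (7)$$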
The constraint that the upper endpoint of this confidence interval is never less than the lower endpoint implies that $b(x) \ge -b(-x)$ for all $x \in \mathbb{R}$. It also seems reasonable to require that $b$ is a strictly increasing function.
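Here is a minimal sketch of how an interval of the form (7) is computed from data for a user-supplied function $b$. The argument `b` is hypothetical and is not the paper's optimized choice. With $b(y) = y + t_{\alpha/2,n-1}$ the sketch reproduces (6), and with $b(y) = \max\{0, y + t_{\alpha,n-1}\}$ it reproduces (2), although that $b$ is not strictly increasing.

```python
import math
from statistics import fmean, stdev


def interval_from_b(xs, b):
    """Interval (7): [-(S/sqrt(n)) b(-sqrt(n) xbar / S), (S/sqrt(n)) b(sqrt(n) xbar / S)]."""
    xbar, s = fmean(xs), stdev(xs)          # stdev uses the (n - 1) divisor, matching S
    scale = s / math.sqrt(len(xs))
    y = xbar / scale                         # the statistic sqrt(n) xbar / S
    lower, upper = -scale * b(-y), scale * b(y)
    if lower > upper:                        # would mean b violates b(y) >= -b(-y)
        raise ValueError("b violates b(y) >= -b(-y)")
    return lower, upper
```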



6. Computation of the interval for the unknown variance case 

In this section we provide computationally convenient expressions that are used to calculate the mixed interval for the unknown variance case. We illustrate the performance of this interval and compare it with that of the corresponding mixed interval when $\sigma^2$ is known, for the case that $n = 24$ and $1-\alpha = 0.95$.

Suppose that $\sigma^2$ is unknown. As in Section 3, let $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. Also let $G(\mathbf{X})$ be a confidence interval for $\mu$ that is of the form (7). Our aim is to minimize the average expected length of $G(\mathbf{X})$ for a given weight function $\nu$, such that the coverage probability of $G(\mathbf{X})$ is at least $1-\alpha$ for all $\mu \in \mathbb{R}$. Let $\theta = \sqrt{n}\,\mu/\sigma$. As we show shortly, the coverage probability $P(\mu \in G(\mathbf{X}))$ is a function of $\theta$. The expected length of $G(\mathbf{X})$ is a function of $(\mu, \sigma^2)$. However, we will introduce a scaled expected length of $G(\mathbf{X})$ which is a function of $\theta$. By using this scaled expected length instead of the expected length, we will be able to achieve our aim by considering only quantities that are functions of $\theta$ (cf. Kabaila (1998, 2005)).

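The coverage probability of $G(\mathbf{X})$ is a function of $\theta$ and is given by

$$P\big(\mu \in G(\mathbf{X})\big) = P\left(-R\, b\!\left(-\frac{X}{R}\right) \le \theta \le R\, b\!\left(\frac{X}{R}\right)\right), \qquad (8)$$

where $X = \sqrt{n}\,\bar{X}/\sigma \sim N(\theta, 1)$ and $R = S/\sigma$. Note that $X$ and $R$ are independent random variables. We assume that $b$ is a strictly increasing function. This implies that $b^{-1}$ exists. A computationally convenient expression for the right-hand side of (8) is

$$\int_0^\infty \left(\Phi\!\left(-r\, b^{-1}\!\left(-\frac{\theta}{r}\right) - \theta\right) - \Phi\!\left(r\, b^{-1}\!\left(\frac{\theta}{r}\right) - \theta\right)\right) f_R(r)\, dr, \qquad (9)$$

where $\Phi$ is the $N(0, 1)$ cumulative distribution function and $f_R$ denotes the density of $R$.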
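The sketch below evaluates (9) by composite Simpson integration over $r$, using the fact that $(n-1)R^2$ has a $\chi^2_{n-1}$ distribution. The truncation point `r_max` and the grid size are hypothetical choices, and `b_inv` must be supplied by the user. For $b(y) = y + c$ the integrand reduces to $\Phi(rc) - \Phi(-rc)$, so the coverage does not depend on $\theta$ and equals $P(|T| \le c)$ for $T \sim t_{n-1}$. With $n = 24$ and $c = 2.068658$ (approximately $t_{0.025,23}$), it should be close to 0.95.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def density_R(r, m):
    """Density of R = S / sigma, using m R^2 ~ chi-square(m) with m = n - 1."""
    if r <= 0.0:
        return 0.0
    v = m * r * r
    log_fv = ((m / 2.0 - 1.0) * math.log(v) - v / 2.0
              - (m / 2.0) * math.log(2.0) - math.lgamma(m / 2.0))
    return math.exp(log_fv) * 2.0 * m * r    # Jacobian of v = m r^2


def simpson(f, lo, hi, k=2000):
    """Composite Simpson rule with an even number k of subintervals."""
    h = (hi - lo) / k
    total = f(lo) + f(hi)
    for i in range(1, k):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def coverage(theta, b_inv, n, r_max=5.0):
    """Coverage probability (9); b_inv is the inverse of a strictly increasing b."""
    m = n - 1

    def integrand(r):
        if r <= 0.0:
            return 0.0
        upper = STD.cdf(-r * b_inv(-theta / r) - theta)
        lower = STD.cdf(r * b_inv(theta / r) - theta)
        return (upper - lower) * density_R(r, m)

    return simpson(integrand, 0.0, r_max)
```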

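We introduce the scaled expected length of $G(\mathbf{X})$, which is a function of $\theta$ and is defined to be

$$\frac{\sqrt{n}}{\sigma}\, E\big(L(G(\mathbf{X}))\big) = E\left(R\left(b\!\left(\frac{X}{R}\right) + b\!\left(-\frac{X}{R}\right)\right)\right).$$

A computationally convenient expression for this scaled expected length is

$$\int_0^\infty \int_{-\infty}^{\infty} \left(b\!\left(\frac{x}{r}\right) + b\!\left(-\frac{x}{r}\right)\right) \phi(x - \theta)\, dx\; r\, f_R(r)\, dr. \qquad (10)$$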
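A sketch of (10) as a nested Simpson integral in pure Python follows. The truncation of the $x$-integral to $\theta \pm 10$, the grid size and `r_max` are hypothetical choices. For $b(y) = y + c$ the integrand is constant in $x$, so the value should equal $2cE(R)$, where $E(R) = \sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def density_R(r, m):
    """Density of R = S / sigma, using m R^2 ~ chi-square(m) with m = n - 1."""
    if r <= 0.0:
        return 0.0
    v = m * r * r
    log_fv = ((m / 2.0 - 1.0) * math.log(v) - v / 2.0
              - (m / 2.0) * math.log(2.0) - math.lgamma(m / 2.0))
    return math.exp(log_fv) * 2.0 * m * r


def simpson(f, lo, hi, k):
    """Composite Simpson rule with an even number k of subintervals."""
    h = (hi - lo) / k
    total = f(lo) + f(hi)
    for i in range(1, k):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def scaled_expected_length(theta, b, n, k=200, r_max=5.0, x_half=10.0):
    """(10): outer integral over r of [inner integral over x] times r f_R(r)."""
    m = n - 1

    def outer_integrand(r):
        if r <= 0.0:
            return 0.0
        inner = simpson(lambda x: (b(x / r) + b(-x / r)) * STD.pdf(x - theta),
                        theta - x_half, theta + x_half, k)
        return inner * r * density_R(r, m)

    return simpson(outer_integrand, 0.0, r_max, k)
```

Since $b(y) + b(-y)$ is even in $y$, the scaled expected length is the same at $\theta$ and $-\theta$, which gives a simple numerical check.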

We use the weight function (5). As with the $\sigma^2$ known case, for $w > 0$, the average scaled expected length criterion is infinite even for the standard confidence interval (6). Therefore, similarly to Section 2, we replace this criterion by the following criterion

$$\int \left(\frac{\sqrt{n}}{\sigma}\, E\big(L(G(\mathbf{X}))\big) - 2 t_{\alpha/2,n-1} E(R)\right) d\nu(\theta), \qquad (11)$$

which takes the (finite) value 0 when $G(\mathbf{X})$ is the standard confidence interval (6). Substituting the expression (10) for the scaled expected length into (11), we obtain

$$\int_0^\infty \int_{-\infty}^{\infty} \left(b\!\left(\frac{x}{r}\right) + b\!\left(-\frac{x}{r}\right) - 2 t_{\alpha/2,n-1}\right) \big(w + \phi(x)\big)\, dx\; r\, f_R(r)\, dr. \qquad (12)$$

Remember that we require the confidence interval to coincide with the standard $1-\alpha$ confidence interval when the data happen to strongly contradict the prior information about $\mu$. The statistic $\sqrt{n}\,\bar{X}/S$ provides an indication of how far $\sqrt{n}\,\mu/\sigma$ is from 0. We therefore satisfy this requirement by setting $b(y) = y + t_{\alpha/2,n-1}$ for all $|y| \ge q$, where $q$ is a specified positive number (chosen to be sufficiently large). Thus $b(x/r) + b(-x/r) - 2t_{\alpha/2,n-1} = 0$ for all $|x|/r \ge q$. Changing the variable of integration from $x$ to $y = x/r$, (12) can now be expressed in the computationally convenient form

$$\int_0^\infty \int_{-q}^{q} \big(b(y) + b(-y) - 2t_{\alpha/2,n-1}\big) \big(w + \phi(ry)\big)\, dy\; r^2 f_R(r)\, dr. \qquad (13)$$
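Criterion (13) can be evaluated in the same way. In this sketch `t_crit`, the grid size and `r_max` are hypothetical inputs. The criterion vanishes for the standard choice $b(y) = y + t_{\alpha/2,n-1}$, and it is negative for a (hypothetical) $b$ that pulls both endpoints towards 0 on $[-q, q]$.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def density_R(r, m):
    """Density of R = S / sigma, using m R^2 ~ chi-square(m) with m = n - 1."""
    if r <= 0.0:
        return 0.0
    v = m * r * r
    log_fv = ((m / 2.0 - 1.0) * math.log(v) - v / 2.0
              - (m / 2.0) * math.log(2.0) - math.lgamma(m / 2.0))
    return math.exp(log_fv) * 2.0 * m * r


def simpson(f, lo, hi, k):
    """Composite Simpson rule with an even number k of subintervals."""
    h = (hi - lo) / k
    total = f(lo) + f(hi)
    for i in range(1, k):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def criterion_13(b, n, t_crit, w, q, k=200, r_max=5.0):
    """(13): int_0^inf int_{-q}^{q} (b(y) + b(-y) - 2 t_crit)(w + phi(r y)) dy r^2 f_R(r) dr."""
    m = n - 1

    def outer_integrand(r):
        if r <= 0.0:
            return 0.0
        inner = simpson(lambda y: (b(y) + b(-y) - 2.0 * t_crit) * (w + STD.pdf(r * y)),
                        -q, q, k)
        return inner * r * r * density_R(r, m)

    return simpson(outer_integrand, 0.0, r_max, k)
```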



For computational ease, we restrict the function $b(y)$ to be a cubic spline in the interval $[-q, q]$. This spline is required to take the value $-q + t_{\alpha/2,n-1}$ at $y = -q$ and $q + t_{\alpha/2,n-1}$ at $y = q$, and has knots that are equally spaced between $-q$ and $q$. In addition, the derivative of this spline is constrained to be 1 at both $y = -q$ and $y = q$.
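One way to realize these spline constraints is a clamped cubic spline that interpolates free values at the interior knots, with end values $\mp q + t_{\alpha/2,n-1}$ and end slopes 1; its second derivatives solve a tridiagonal system. Parameterizing $b$ by its interior-knot values is an assumption made for illustration, since the paper does not state how its spline was parameterized, and the optimization of those values subject to the coverage constraint is not shown. The function below (hypothetical name) returns $b$, extended as $y + t_{\alpha/2,n-1}$ outside $[-q, q]$.

```python
def clamped_spline(q, t_crit, interior):
    """Cubic spline b on [-q, q] with equally spaced knots, b(-q) = -q + t_crit,
    b(q) = q + t_crit, b'(-q) = b'(q) = 1, and b(y) = y + t_crit for |y| >= q.
    `interior` holds the (hypothetical) free values at the interior knots."""
    N = len(interior) + 1                    # number of subintervals
    h = 2.0 * q / N
    knots = [-q + i * h for i in range(N + 1)]
    ys = [-q + t_crit] + list(interior) + [q + t_crit]
    # Tridiagonal system for the second derivatives M_0..M_N, end slopes clamped to 1.
    sub = [0.0] + [1.0] * N
    diag = [2.0] + [4.0] * (N - 1) + [2.0]
    sup = [1.0] * N + [0.0]
    rhs = [6.0 / h * ((ys[1] - ys[0]) / h - 1.0)]
    rhs += [6.0 / h**2 * (ys[i + 1] - 2.0 * ys[i] + ys[i - 1]) for i in range(1, N)]
    rhs += [6.0 / h * (1.0 - (ys[N] - ys[N - 1]) / h)]
    # Thomas algorithm (forward sweep, then back substitution).
    cp, dp = [0.0] * (N + 1), [0.0] * (N + 1)
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, N + 1):
        denom = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / denom
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / denom
    M = [0.0] * (N + 1)
    M[N] = dp[N]
    for i in range(N - 1, -1, -1):
        M[i] = dp[i] - cp[i] * M[i + 1]

    def b(y):
        if abs(y) >= q:
            return y + t_crit
        i = min(int((y + q) / h), N - 1)     # index of the subinterval containing y
        left, right = knots[i], knots[i + 1]
        return (M[i] * (right - y) ** 3 / (6 * h) + M[i + 1] * (y - left) ** 3 / (6 * h)
                + (ys[i] / h - M[i] * h / 6) * (right - y)
                + (ys[i + 1] / h - M[i + 1] * h / 6) * (y - left))

    return b
```

With $q = 8$ the interior knots are $-7, \ldots, 7$, so 15 free values are passed. If those values are set to $y + t_{\alpha/2,n-1}$, the spline reproduces the standard choice of $b$ exactly.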

We minimize (13) with respect to the function $b$, subject to the constraints on $b$ described at the end of Section 5 and the constraint that the coverage probability (9) is at least $1-\alpha$ for all $\theta \in \mathbb{R}$. We denote the minimizing function $b$ by $b_M$. We call the confidence interval for $\mu$ corresponding to $b_M$ the mixed interval and denote it by $G_M$. We denote the standard confidence interval (6) by $G_S$. Similarly to Section 3, we use $G_S$ as the standard against which other $1-\alpha$ confidence intervals for $\mu$ are judged. The efficiency of $G_S$ relative to $G$, a $1-\alpha$ confidence interval for $\mu$, is defined to be $\big(E(L(G(\mathbf{X})))/E(L(G_S(\mathbf{X})))\big)^2$, which is a function of $\theta$.

To illustrate the performance of the mixed interval and to compare it with that of the corresponding mixed interval when $\sigma^2$ is known, we consider the case that $n = 24$ and $1-\alpha = 0.95$. For the computation of $G_M$, we chose $q = 8$ with the knots of the cubic spline at $-8, -7, \ldots, 7, 8$. We also chose $w = 0.1$. The efficiency of $G_S$ relative to $G_M$ is shown in the right panel of Figure 2. When the prior information is correct, i.e. $\mu = 0$, the efficiency of $G_S$ relative to $G_M$ is 0.8013. Also, the efficiency of $G_S$ relative to $G_M$ never exceeds 1.1930. Furthermore, $G_M$ reverts to the standard $1-\alpha$ confidence interval when the prior information happens to be badly incorrect. This is reflected in the fact that the efficiency of $G_S$ relative to $G_M$ approaches 1 as $|\theta| \to \infty$. It is notable that the coverage probability of $G_M$ was found to be 0.95 to an extremely good approximation throughout the parameter space. Now $n = 24$ is quite large, and so $S$ will be probabilistically close to $\sigma$. We therefore expect the efficiency of $G_S$ relative to $G_M$ to be similar to the efficiency of $D_S$ relative to $D_M$ (the interval for $\mu$ corresponding to $C_M$) when $w = 0.1$. This expectation is confirmed by the left panel of Figure 2.



Figure 2: These plots concern the case that $n = 24$, $1-\alpha = 0.95$ and $w = 0.1$. The left panel (Mixed Interval, $\sigma^2$ known) is a plot of the efficiency of $D_S$ relative to $D_M$ as a function of $\theta$. The right panel (Mixed Interval, $\sigma^2$ unknown) is a plot of the efficiency of $G_S$ relative to $G_M$ as a function of $\theta$.